Robust Crowd Labeling Using Little Expertise

Authors

  • Faiza Khan Khattak
  • Ansaf Salleb-Aouissi
Abstract

Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. However, the problem of obtaining good-quality labels from a crowd and integrating them remains unresolved. To address this challenge, we propose a new framework that automatically combines and boosts bulk crowd labels supported by a limited number of “ground truth” labels from experts. The ground truth labels help estimate the individual expertise of crowd labelers and the difficulty of each instance, both of which are used to aggregate the labels. We show through extensive experiments that, unlike other state-of-the-art approaches, our method is robust even in the presence of a large proportion of bad labelers in the crowd. We derive a lower bound on the number of expert labels needed to judge the crowd and the dataset, as well as to obtain better-quality labels.


Similar resources

Quality Control of Crowd Labeling through Expert Evaluation

We propose a general scheme for quality-controlled labeling of large-scale data using multiple labels from the crowd and a “few” ground truth labels from an expert in the field. Expert-labeled instances are used to assign weights to the expertise of each crowd labeler and to the difficulty of each instance. Ground truth labels for all instances are then approximated through those weights along ...


Toward a Robust Crowd-labeling Framework using Expert Evaluation and Pairwise Comparison

Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. One of the main challenges in the crowd-labeling task is to control for or determine in advance the proportion of low-quality/malicious labelers. If that proportion grows too high, there is often a phase transition leading to a steep, non-linear drop in labeling accuracy as...


Sembler: Ensembling Crowd Sequential Labeling for Improved Quality

Many natural language processing tasks, such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation, can be formulated as sequential data labeling problems. Building a sound labeler requires a very large number of correctly labeled training examples, which may not always be possible. On the other hand, crowdsourcing provides an inexpensive yet efficient alter...


Toward a Robust and Universal Crowd-Labeling Framework

One of the main challenges in crowd-labeling is to control for or determine in advance the proportion of low-quality/malicious labelers. We propose methods that estimate the labeler- and data-instance-related parameters using frequentist and Bayesian approaches. All these approaches rely on expert-labeled instances (ground truth) for a small percentage of the data to learn the parameters. We als...


Using Expertise for Crowd-Sourcing

In this paper, we examine whether the use of expertise ratings can help crowd-sourcing systems. We show, using simulations, that a crowd-sourcing system based on social navigation works better when users’ expertise levels are taken into account.



Publication date: 2013